Distributional Similarity Models: Clustering vs. Nearest Neighbors
Abstract
Distributional similarity is a useful notion in estimating the probabilities of rare joint events. It has been employed both to cluster events according to their distributions, and to directly compute averages of estimates for distributional neighbors of a target event. Here, we examine the tradeoffs between model size and prediction accuracy for cluster-based and nearest neighbors distributional models of unseen events.
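The two families of models named in the abstract can be contrasted with a small sketch. It is only an illustration under stated assumptions, not the method evaluated in the paper: the toy counts, the choice of Jensen-Shannon divergence as the similarity measure, the exponential weighting with the hypothetical parameter `beta`, and the hand-assigned clusters are all made up for the example. The nearest-neighbor estimate averages the conditional probabilities of the most similar events, while the cluster estimate pools the counts of every member of the target's cluster.

```python
import math

def normalize(counts):
    """Turn raw co-occurrence counts into a conditional distribution."""
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two sparse distributions."""
    div = 0.0
    for y in set(p) | set(q):
        py, qy = p.get(y, 0.0), q.get(y, 0.0)
        m = 0.5 * (py + qy)
        if py > 0:
            div += 0.5 * py * math.log(py / m)
        if qy > 0:
            div += 0.5 * qy * math.log(qy / m)
    return div

def nearest_neighbor_estimate(target, y, dists, k=2, beta=5.0):
    """Similarity-weighted average of P(y | x') over the k nearest neighbors of target.

    k and beta are illustrative parameters, not values taken from the paper.
    """
    neighbors = sorted(
        (x for x in dists if x != target),
        key=lambda x: js_divergence(dists[target], dists[x]),
    )[:k]
    weights = {
        x: math.exp(-beta * js_divergence(dists[target], dists[x]))
        for x in neighbors
    }
    total = sum(weights.values())
    return sum(w * dists[x].get(y, 0.0) for x, w in weights.items()) / total

def cluster_estimate(target, y, counts, clusters):
    """P(y | cluster containing target), pooling the counts of all cluster members."""
    members = next(c for c in clusters if target in c)
    pooled = {}
    for x in members:
        for obj, c in counts[x].items():
            pooled[obj] = pooled.get(obj, 0) + c
    return normalize(pooled).get(y, 0.0)

# Toy (noun, verb) counts; the pair ("apple", "slice") is unseen.
counts = {
    "apple":  {"eat": 8, "peel": 2},
    "pear":   {"eat": 6, "peel": 3, "slice": 1},
    "orange": {"eat": 5, "peel": 5},
    "car":    {"drive": 9, "park": 1},
}
dists = {x: normalize(c) for x, c in counts.items()}
clusters = [{"apple", "pear", "orange"}, {"car"}]  # assumed, hand-assigned clusters

nn = nearest_neighbor_estimate("apple", "slice", dists)
cl = cluster_estimate("apple", "slice", counts, clusters)
print(f"nearest-neighbor estimate: {nn:.4f}")
print(f"cluster estimate:          {cl:.4f}")
```

Both estimates give the unseen pair a nonzero probability, whereas the raw relative frequency would be zero. The sketch also hints at the size tradeoff: the cluster model only needs one pooled distribution per cluster, while the nearest-neighbor model has to keep the distribution of every individual event.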
Similar resources
An Adaptive Spectral Clustering Algorithm Based on the Importance of Shared Nearest Neighbors
Constructing the similarity matrix is a significant step in the spectral clustering algorithm, and the Gaussian kernel function is one of the most common measures for constructing it. However, with a fixed scaling parameter, the similarity between two data points is not adaptive and is not appropriate for multi-scale datasets. In this paper, through quantitating the value ...
Combining Syntactic Co-occurrences and Nearest Neighbours in Distributional Methods to Remedy Data Sparseness.
The task of automatically acquiring semantically related words has led people to study distributional similarity. The distributional hypothesis states that words that are similar share similar contexts. In this paper we present a technique that aims at improving the performance of a syntax-based distributional method by augmenting the original input of the system (syntactic co-occurrences) wit...
Improvement of Jarvis-Patrick Clustering Based on Fuzzy Similarity
Different clustering algorithms are based on different similarity or distance measures (e.g. Euclidean distance, Minkowski distance, Jaccard coefficient, etc.). The Jarvis-Patrick clustering method uses the number of common neighbors among the k-nearest neighbors of objects to disclose the clusters. The main drawback of this algorithm is that its parameters determine a too crisp cutting criter...
FLAG: Fast Large-Scale Graph Construction for NLP
Many natural language processing (NLP) problems involve constructing large nearest-neighbor graphs between word pairs by computing distributional similarity between word pairs from large corpora. In this paper, we first describe a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data in a memory- and time-efficient manner, FLAG maintains...
Spatio-Temporal Outlier Detection Technique
Outlier detection is a very important function of data mining and has numerous applications. This paper proposes a clustering-based approach for outlier detection using spatio-temporal data. It uses a three-step approach to detect spatio-temporal outliers. In the first step of outlier detection, clustering is performed on the spatio-temporal dataset with the proposed Spatio-Temporal Shared Nearest ...